Abstract: Composition is amongst the major challenges faced
in language engineering. Erdweg et al. offered a taxonomy 
for language composition. Mernik catalogued the use of the 
Language Definitional Framework LISA for composition sorts 
in that taxonomy. We produce a similar catalogue for embedded 
language engineering in Scala. 

We begin with techniques that are not specific to Scala. They 
are applicable in any host language with a module system and 
support for higher order functions. We then present two more
techniques to examine Scala-specific language engineering.
Interestingly enough, even though dealing with embedded languages,
in terms of lines of code, our material is of comparable length
to its LISA counterpart. Our work lends insight into Scala’s
serviceability for composition, as a host for embedded language
engineering.

I. Introduction 

a) Language composition is a piece of reality!: Every
day, new programming languages are born by
combining ideas from older languages. Inspiration aside, that
is an act of composition in many cases. For example, roughly
put, Scala adds functional programming and ML modules with
mixin composition to Java; which, in turn, is C++ without
pointers; which, in turn, is C with OOP.

The taxonomy of Erdweg et al. [1] suggests a terminology 
and notations for describing such compositions. According to 
them, one can formalise our Scala description as: 

Scala ~ C ◁ C++ ▷ Java ◁ (MLModule ⊎ Mixin)    (1)

b) Observations from Chemistry: Consider the reaction: 

H₂SO₃ + 2 × NaOH → 2 × H₂O + Na₂SO₃    (2)

In Chemistry, two key ingredients for success in the study of
such equations are: (CI₁) the availability of substances as the
subjects of study, and (CI₂) knowledge about how to perform
a desirable composition. In reaction (2), for instance, both
substances H₂SO₃ and NaOH need to be available. One also
needs to know how to double NaOH for the equation balance
to be right. Also, how to add NaOH to H₂SO₃ (like the rate
of addition, proper temperature, etc.) needs to be known.

c) Programmatic Availability & Composition: The study
of formulae like equation (1) determines the precise relative
position of languages. Using the outcome, one would be able
to add, for example, what is missing in equation (1) so that
the “~” can be replaced by an “=”. One would also gather
that the left-out “FP ⊎” is necessary right before MLModule
for the balance to be right. Such manipulations are similar
to adjusting coefficients in reaction (2) to obtain a balance.
Similar to Chemistry, two key ingredients become noticeable
here: (PLI₁) programmatic availability of programming
languages themselves and their belongings as the subjects of
study, and (PLI₂) knowledge about how to programmatically
obtain desirable language compositions.

By the time of this writing, (mainstream) languages are
next to inaccessible as programmatic entities. The study of
programmatic language composition, nonetheless, can be
conducted independently using, say, contrived languages. That is
how this paper tries to gain (PLI₂).

d) Contributions: We demonstrate three techniques for
composing languages embedded in Scala. The first (Section II)
is applicable in any host language with a module system and
support for higher order functions. The second (Section III) is
based on Lightweight Modular Staging (LMS) [2]. The
third, which is also a new solution to the Expression Problem
(EP) [3], [4], [5], employs (possibly restricted) abstract types.
The trick in our third technique is promoting the cases of
Algebraic Data Types (ADTs) into their own ADT-parameterised
standalone components. We showcase each technique using
the example compositions of Mernik [6]. We then compare
the three techniques for their success in addressing the EP
concerns (Section V). A discussion of the related work
follows in Section VI.

e) Coding Conventions: This paper assumes familiarity
with Scala. For each showcase, the syntax and semantics
come in separate packings called syntax and semantics,
respectively. Due to space restrictions, in our code, the name
of the showcase is only appended as a comment to the end
of the first line of the respective syntax or semantics.
For the same reason, our code is also otherwise unusually
compressed. Whilst the showcases are referred to in the prose
in CamelCase, their respective Scala package (containing the
showcase’s syntax and semantics) is named like_this
or abbreviated as lt.

II. Scala-Unspecific 

Erdweg et al. catalogue five different ways languages can be
composed: language extension, language restriction, language
unification, self-extension, and extension composition. Mernik
offers simple DSLs to showcase those ways in LISA [7]. In
this section, we employ Mernik’s simple DSLs for the same
purpose, albeit in Scala.

A. Language Extension 

A base language B is said to be extended to a language
E when the description of B is amended with a description
fragment to get E. Erdweg et al. denote that by B ◁ E.
Consider the language Robot below (packaged under the name
robot in Scala) for a robot arm that takes commands for
moving one unit in any of the four 2D directions. The
semantics of Robot involves updating the arm’s position
(recorded in terms of the x and y coordinates) based on the
commands (lines 11 to 16).

1  object syntax {//robot
2    class Command
3    case object Left extends Command
4    case object Right extends Command
5    case object Up extends Command
6    case object Down extends Command
7    case class Commands(s: Seq[Command])}
8  object semantics {import syntax._//robot
9    class Position (var x: Int, var y: Int)
10   object position extends Position(0, 0)
11   def locate: Command => Unit = {
12     case Left => position.x -= 1
13     case Right => position.x += 1
14     case Up => position.y += 1
15     case Down => position.y -= 1}
16   def locate (cs: Commands): Unit = cs.s.foreach(locate)
17 }

Robot is extended to RobotTime (the robot_time package)
by adding to the semantics, i.e., Robot ◁ RobotTime:

1 package robot_time
2 import robot._; import syntax._
3 def time(cs: Commands): Int = cs.s.length

Assuming that executing each command takes one time unit,
the total time required for a sequence of commands is the length
of the sequence. The method time in line 3 above adds that piece of
semantics to Robot to get RobotTime. Whereas Commands(
Right, Down, Down) in Robot has only got the semantics
x = 1, y = −2, it also has the semantics t = 3 in RobotTime.
(The coordinates are obtained by locate in line 16 of robot
and the timing by line 3 of robot_time.)
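
The Robot ◁ RobotTime extension can be tried out with the following minimal, self-contained sketch. It is not the packaged code of this section: the wrapper names RobotSketch and RobotTimeSketch are hypothetical, Commands is assumed to take varargs so that Commands(Right, Down, Down) type-checks, and the position is returned as a pair instead of being stored in a shared mutable object.

```scala
// Sketch only: RobotSketch and RobotTimeSketch are hypothetical names.
object RobotSketch {
  // Syntax of Robot: four unit moves. Command is left unsealed so that
  // later languages may add commands of their own.
  trait Command
  case object Left  extends Command
  case object Right extends Command
  case object Up    extends Command
  case object Down  extends Command
  // Assumption: varargs, so that Commands(Right, Down, Down) type-checks.
  case class Commands(s: Command*)

  // Semantics of Robot: fold the commands into a final (x, y) position.
  def locate(cs: Commands): (Int, Int) =
    cs.s.foldLeft((0, 0)) {
      case ((x, y), Left)  => (x - 1, y)
      case ((x, y), Right) => (x + 1, y)
      case ((x, y), Up)    => (x, y + 1)
      case ((x, y), Down)  => (x, y - 1)
    }
}

// The extension only adds a new piece of semantics; the syntax is reused.
object RobotTimeSketch {
  import RobotSketch._
  def time(cs: Commands): Int = cs.s.length
}
```

With these definitions, locate and time can both be applied to the same Commands(Right, Down, Down) value, which is the sense in which RobotTime adds semantics without touching the syntax of Robot.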

Here is a difference between our implementation of RobotTime
and that of Mernik: The latter is done in LISA, a
Language Definitional Framework (LDF) that combines OOP
with Attribute Grammars (AGs) [8], [9]. As such, LISA’s
counterpart for time has to visit all the grammatical rules
in Robot to attribute the new piece of semantics to them. On
the contrary, Scala gave us the joy of simply equating time with
the number of the commands, regardless of the grammatical
rules involved.

B. Language Restriction 

A base language B is said to be restricted to a language
R when certain parts of B’s features are removed upon
transition to R. This is denoted by B ▷ R. A typical usage
of that is when a language is narrowed to a core of it. That is,
certain parts of the base syntax are cancelled into combinations
of other base syntactic parts that are deemed to be equivalent.
For example, both GpH [10] and Utrecht Haskell [11] are
developed like that.

The language RobotPositive below (packaged under
robot_positive) restricts Robot to only Up and Right
commands. (Technically, the object syntax below is not
required. Yet, we retain it for completeness.)

object syntax {//robot_positive
  import robot.syntax.{Right, Up, Commands}}
object semantics {//robot_positive
  import robot.syntax.{Right, Up, Commands}
  import robot.semantics.position
  def locate (cs: Commands): Unit = {
    for(c <- cs.s) c match {
      case Right => position.x += 1
      case Up => position.y += 1
    }
  }
}

Any attempt to use the expression of the previous section
under RobotPositive will fail to compile because it uses
Down, which is absent in RobotPositive. (Of course, one
can still access robot.syntax.Down in RobotPositive. But,
we are only concerned about unqualified access here.) On the
other hand, Commands(Right, Up, Up) has the semantics
x = 1, y = 2 under RobotPositive.
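
Restriction by selective import can be illustrated with the following self-contained sketch. The names RestrictionSketch, robot and robotPositive are hypothetical stand-ins for the packages of this section, Commands is assumed to take varargs, and the position is returned as a pair rather than kept in a shared object.

```scala
// Sketch only: all object names here are hypothetical.
object RestrictionSketch {
  object robot {
    trait Command
    case object Left  extends Command
    case object Right extends Command
    case object Up    extends Command
    case object Down  extends Command
    case class Commands(s: Command*) // assumption: varargs
  }

  object robotPositive {
    // Only Right and Up are made available unqualified.
    import robot.{Right, Up, Commands}

    def locate(cs: Commands): (Int, Int) =
      cs.s.foldLeft((0, 0)) {
        case ((x, y), Right) => (x + 1, y)
        case ((x, y), Up)    => (x, y + 1)
      }

    // Commands(Right, Down) would be rejected here, because the
    // unqualified name Down is not in scope; robot.Down still is.
    val example: (Int, Int) = locate(Commands(Right, Up, Up))
  }
}
```

As in the prose, the restriction only concerns unqualified access: a qualified robot.Down still type-checks as a Command, and this sketch's locate would then fail at run time with a MatchError.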

C. Language Unification 

Erdweg et al. say two languages L₁ and L₂ are unified
to L when both L₁ and L₂ make sense independently from
one another and from L (as the composition’s outcome).
Furthermore, in L, neither L₁ nor L₂ should be dominated
by the other so that a concept of equity prevails in the
composition. The notation is L = L₁ ⊎_g L₂, where g is the
so-called glue code required for the composition.

Having seen the language Robot, we now consider the 
language ExprAdd (packaged under expr_add): a simple 
ADT with two cases for natural numbers and addition. 

object syntax {//expr_add
  class Expr {//Expr ::= Expr + Term | ...
    def + (t: Term): Expr = Add(this, t)
  }
  class Term extends Expr//Expr ::= ... | Term
  //Term ::= n
  case class Num(n: Int) extends Term
  case class Add(left: Expr, right: Term) extends Expr
}
object semantics {//expr_add
  import syntax._
  def value: Expr => Int = {
    case Num(n) => n
    case Add(e, t) => value(e) + value(t)
  }
}

Using value in line 12 above, one obtains the semantics
5, 12, and 6 for the expressions Num(5), Num(10) + Num(2),
and Num(1) + Num(2) + Num(3), respectively.

The language RobotUniExprAdd below (packaged under
robot_uni_expr_add) unifies Robot and ExprAdd by
allowing the robot arm to take commands for moving
as many units in any of the four directions as the
corresponding ExprAdd argument evaluates to. As such,
Commands(Right(Num(5)), Up(Num(2) + Num(10)),
Up(Num(2) + Num(2) + Num(2)), Down(Num(4))) has
the semantics x = 5, y = 14. Check locate in line 17 below.
1  object syntax {//robot_uni_expr_add
2    import robot.syntax.Command
3    import expr_add.syntax._
4
5    case class Left(e: Expr) extends Command
6    case class Right(e: Expr) extends Command
7    case class Up(e: Expr) extends Command
8    case class Down(e: Expr) extends Command
9  }
10 object semantics {//robot_uni_expr_add
11   import robot.syntax.Commands
12   import robot_uni_expr_add.syntax._
13
14   import robot.semantics.position
15   import expr_add.semantics._
16
17   def locate (cs: Commands): Unit = {
18     for (c <- cs.s) c match {
19       case Left(e) => position.x -= value(e)
20       case Right(e) => position.x += value(e)
21       case Up(e) => position.y += value(e)
22       case Down(e) => position.y -= value(e)
23     }
24   }
25 }
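
The unification can be exercised with the following self-contained sketch, which reproduces x = 5, y = 14 for the example of this section. It folds both languages into one hypothetical object, RobotUniExprAddSketch, assumes a varargs Commands, and returns the position as a pair; none of these choices is part of the packaged code.

```scala
// Sketch only: RobotUniExprAddSketch is a hypothetical name.
object RobotUniExprAddSketch {
  // ExprAdd: numbers and addition.
  class Expr { def +(t: Term): Expr = Add(this, t) }
  class Term extends Expr
  case class Num(n: Int) extends Term
  case class Add(left: Expr, right: Term) extends Expr

  def value(e: Expr): Int = e match {
    case Num(n)    => n
    case Add(l, r) => value(l) + value(r)
  }

  // Robot commands, now parameterised by an expression (the glue).
  trait Command
  case class Left(e: Expr)  extends Command
  case class Right(e: Expr) extends Command
  case class Up(e: Expr)    extends Command
  case class Down(e: Expr)  extends Command
  case class Commands(s: Command*) // assumption: varargs

  def locate(cs: Commands): (Int, Int) =
    cs.s.foldLeft((0, 0)) {
      case ((x, y), Left(e))  => (x - value(e), y)
      case ((x, y), Right(e)) => (x + value(e), y)
      case ((x, y), Up(e))    => (x, y + value(e))
      case ((x, y), Down(e))  => (x, y - value(e))
    }

  val example: Commands =
    Commands(Right(Num(5)), Up(Num(2) + Num(10)),
             Up(Num(2) + Num(2) + Num(2)), Down(Num(4)))
}
```

Here the vertical moves contribute 12 + 6 − 4 = 14 and the single horizontal move contributes 5.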

D. Self Extension 

This is the situation when the description of a language L
itself is used for extending it. Typically, embedded DSLs
self-extend their host language. For example, all the languages we
present in this paper self-extend Scala.

Like Mernik, we believe that demonstrating self extension 
takes much more than the volume of a single research paper. 
This is because bootstrapping a language L to the level where 
it can handle self extension is already more involved than that 
volume. Hence, we too drop demonstration of self extension. 

E. Extension Composition 

Extension composition is when (both or at least one of) the
language descriptions that are to be composed are themselves
compositions of other language descriptions. As such,
extension composition can be regarded as higher order composition.
Six combinations of extension and unification are possible
(three distinguished by Mernik):

1) Double-Unification (⊎⊎): L₁ ⊎_g (L₂ ⊎_h L₃).

2) Double-Extension (◁◁): B ◁ E₁ ◁ E₂.

3) Extension by a Unification (◁(⊎)): B ◁ (L₁ ⊎ L₂).

4) Extension of a Unification ((⊎)◁): (L₁ ⊎ L₂) ◁ E.

5) Unification with an Extension ({⊎, (◁)}): L ⊎ (B ◁ E)
or (B ◁ E) ⊎ L. Note the symmetry.

We now consider each combination. 

1) Double-Unification (⊎⊎): To that end, we begin by
presenting Mernik’s language Dec (packaged under dec) in 
Scala. Dec enables the programmer to bind a set of variables 
to integer constants. 

1 object syntax {//dec 

2 case class ConstDefList(ds: Map[String, Int]) } 

Unsurprisingly, the (Scala-automatic) semantics of
ConstDefList("a" -> 5, "b" -> 10) is then {a ↦ 5, b ↦ 10}.

With that, we illustrate the first class of Mernik’s extension
compositions using RobotUniExprAddUniDec (packaged
under rueaud). As suggested by its name, this language is
(Robot ⊎ ExprAdd) ⊎ Dec. The Robot ⊎ ExprAdd portion is
already presented. See robot_uni_expr_add in Section II-C.
We now show how to obtain the remaining unification.

import expr_add.syntax.{Expr, Term}
object syntax {//rueaud
  import robot.syntax.Commands; import dec.syntax._
  implicit class CDLInCs (val cdl: ConstDefList) {
    def in (s: Commands) = {
      consts = cdl.ds; new EnvComm(cdl.ds, s)
    }
  }
  class EnvComm (val ds: Map[String, Int],
                 val cs: Commands)
  var consts: Map[String, Int] = Map()
  case class Var(n: String) extends Term}
object semantics {//rueaud
  import syntax._; import robot_uni_expr_add.syntax._
  import robot.semantics._
  def value_ext: (Expr, Expr => Int) => Int = {
    case (Var(n), c) => consts(n)
    case (e, c) =>
      expr_add.ext_semantics.value_ext(e, c)}
  def value(e: Expr): Int = value_ext(e, value)
  def locate (r: EnvComm): Unit = {
    r.cs.s.foreach {
      case Left(e) => position.x -= value(e)
      case Right(e) => position.x += value(e)
      case Up(e) => position.y += value(e)
      case Down(e) => position.y -= value(e)
    }
  }
}

rueaud.syntax aims at reusing the former language
descriptions as they are. To that end, it takes a pimp my
library approach [12] to implicitly (lines 4 to
10 above) give instances of dec.ConstDefList the extra
feature of being followed by commands possibly referring to
the declarations. Such declarations followed by commands
are then instances of EnvComm. The variable consts (line 11)
is where the processed declarations are stored. The new ADT
case Var (line 12) is for looking up the value a name is bound
to. rueaud legitimises commands for moving the robot arm
as many units as a pertaining expression evaluates to (lines 23
to 26). Note that, because of Var, those expressions can refer
to declarations as well. All that together gives
ConstDefList("a" -> 5, "b" -> 10) in Commands(Right(Var("a")),
Up(Num(2) + Var("b")), Down(Num(4))) the
semantics x = 5, y = 8 in RobotUniExprAddUniDec.

Instead of reusing expr_add.semantics.value, the
rueaud.semantics.value method uses the method
expr_add.ext_semantics.value_ext, which will be
explained shortly. This is because the former is closed on
the set of ADT cases it can handle. Hence, we resort to the
following extensible semantics of ExprAdd:


object ext_semantics {
  import syntax._
  def value_ext: (Expr, Expr => Int) => Int = {
    case (Num(n), c) => n
    case (Add(e, t), c) => c(e) + c(t)}
  def value(e: Expr): Int = value_ext(e, value)
}


In the fashion of cTC'o [13], value_ext above takes a 
continuation argument c (line 3), which caters for postponing
the closing time until the appropriately complete shape [14]
of the ADT is known (line 6 above for expr_add and
line 20 for rueaud). As such, extending RobotUniExprAdd
to RobotUniExprAddUniDec here involves manipulating the
former. See Section V for more.
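
The role of the continuation argument can be seen in isolation in the following self-contained sketch. It is an illustration under assumptions, not the packaged code: ContinuationSketch and its member names are hypothetical, the evaluators are written as methods rather than function values, and the environment is passed explicitly instead of being read from the mutable consts.

```scala
// Sketch only: every name in this object is hypothetical.
object ContinuationSketch {
  class Expr
  class Term extends Expr
  case class Num(n: Int) extends Term
  case class Add(left: Expr, right: Term) extends Expr

  // Open evaluator for ExprAdd: sub-expressions go through c.
  def addValueExt(e: Expr, c: Expr => Int): Int = e match {
    case Num(n)    => n
    case Add(l, r) => c(l) + c(r)
  }
  // Closing the recursion when ExprAdd is the whole language.
  def addValue(e: Expr): Int = addValueExt(e, addValue)

  // A later language adds the case Var and defers the rest.
  case class Var(name: String) extends Term
  def decValueExt(env: Map[String, Int])(e: Expr, c: Expr => Int): Int =
    e match {
      case Var(n) => env(n)
      case other  => addValueExt(other, c)
    }
  // Closing the recursion once Var is part of the ADT's shape.
  def decValue(env: Map[String, Int])(e: Expr): Int =
    decValueExt(env)(e, (sub: Expr) => decValue(env)(sub))

  val six: Int    = addValue(Add(Add(Num(1), Num(2)), Num(3)))
  val twelve: Int = decValue(Map("b" -> 10))(Add(Num(2), Var("b")))
}
```

Because addValueExt never calls itself directly, the Var nested under Add is evaluated by whichever closed evaluator was supplied as c, which is what makes the base semantics reusable after extension.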

2) Double-Extension (◁◁): The idea in RobotTimeSpeed
below (packaged under robot_time_speed) is to enable
the user to instruct the robot arm with the speed for its
subsequent moves, until further notice. It adds a pertaining
command to RobotTime to obtain Robot ◁ RobotTime ◁
RobotTimeSpeed.

1  object syntax {//robot_time_speed
2    import robot.syntax.Command
3    case class Speed(i: Int) extends Command
4  }
5  object semantics {//robot_time_speed
6    import syntax._; import robot.syntax.{Command, Commands}
7    import robot.semantics.position
8    def locate: Command => Unit = {
9      case Speed(_) => {}
10     case c => robot.semantics.locate(c)}
11   def locate(cs: Commands): Unit = cs.s.foreach(locate)
12   var speed: Double = 1.0
13   def time(cs: Commands) : Double = {
14     var sum: Double = 0.0
15     for(c <- cs.s) c match {
16       case Speed (i) => speed = i
17       case _ => sum += (1.0 / speed)
18     }
19     sum
20   }
21 }

The new command for altering speed is Speed in line 3
above. This new command has no impact on the arm’s
position, as manifested in line 9. It is in the time calculation
where, once used, the related variable (i.e., speed in line 12)
is updated accordingly (line 16) and taken into consideration
for subsequent commands (line 17). Commands(Up, Speed(2),
Right, Left) has the semantics x = 0, y = 1, t = 2
in RobotTimeSpeed.
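
The timing part of that example can be checked with the following self-contained sketch. It differs from the listing in ways that are assumptions of the sketch: the name RobotTimeSpeedSketch is hypothetical, Commands takes varargs, and the current speed is threaded through a fold (starting at 1.0 on every call) instead of being kept in a mutable variable.

```scala
// Sketch only: RobotTimeSpeedSketch is a hypothetical name.
object RobotTimeSpeedSketch {
  trait Command
  case object Left  extends Command
  case object Right extends Command
  case object Up    extends Command
  case object Down  extends Command
  case class Speed(i: Int) extends Command
  case class Commands(s: Command*) // assumption: varargs

  // The accumulator is (current speed, elapsed time).
  def time(cs: Commands): Double =
    cs.s.foldLeft((1.0, 0.0)) {
      case ((_, sum), Speed(i)) => (i.toDouble, sum)             // no time spent
      case ((speed, sum), _)    => (speed, sum + 1.0 / speed)    // one move
    }._2

  val example: Commands = Commands(Up, Speed(2), Right, Left)
}
```

For the example, Up takes 1 time unit at the initial speed, and Right and Left take 0.5 each after Speed(2), giving t = 1 + 0.5 + 0.5 = 2.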

3) Extension by a Unification (◁(⊎)): We now demonstrate
RobotExtExprAddUniDec = Robot ◁ (ExprAdd ⊎ Dec).
We begin with ExprAddUniDec (packaged under eaud):

1  import expr_add.syntax._
2  object syntax {//eaud
3    import dec.syntax._
4    class EnvExpr (val ds: Map[String, Int], val e: Expr)
5    implicit class CDL2CDLInE (val cdl: ConstDefList) {
6      def in (e: Expr) = {
7        consts = cdl.ds
8        new EnvExpr(cdl.ds, e)
9      }
10   }
11   var consts: Map[String, Int] = Map()
12   case class Var(n: String) extends Term
13 }
14 object semantics {//eaud
15   import syntax._
16   import dec.syntax._
17   def value_ext: (Expr, Expr => Int) => Int = {
18     case (Var(n), c) => consts(n)
19     case (e, c) => expr_add.ext_semantics.value_ext(e, c)
20   }
21   def value(e: Expr): Int = value_ext(e, value)
22   def value(ee: EnvExpr): Int = value(ee.e)
23 }

eaud is similar to rueaud in Section II-E1 and we drop further
explanation. RobotExtExprAddUniDec below (packaged
under reeaud) tries to make use of eaud.

1  object syntax {//reeaud
2    import dec.syntax._; import robot.syntax.Commands
3    class EnvComm(val ds: Map[String, Int],
4                  val cs: Commands)
5    implicit class CDL2CDLInC (val cdl: ConstDefList) {
6      def in (s: Commands) = {
7        consts = cdl.ds
8        new EnvComm(cdl.ds, s)
9      }
10   }
11   var consts = eaud.syntax.consts
12 }
13 object semantics {//reeaud
14   import robot.semantics.position
15   import robot_uni_expr_add.syntax._
16   import eaud.semantics.value; import syntax._
17   def locate(r: EnvComm): Unit = {
18     r.cs.s.foreach {
19       case Left(e) => position.x -= value(e)
20       case Right(e) => position.x += value(e)
21       case Up(e) => position.y += value(e)
22       case Down(e) => position.y -= value(e)
23     }
24   }
25 }

Here are the few idiosyncrasies of reeaud: Firstly, reeaud
fails to reuse most of the syntactic facilities of eaud.
This is because the former employs declarations followed
by commands, whereas the latter employs declarations
followed by expressions. In line 11, nevertheless, consts is
reused. Secondly, even though RobotExtExprAddUniDec =
Robot ◁ ..., in reeaud.semantics, we do not reuse
robot.syntax. On the contrary, in line 15, it reuses the
syntax of robot_uni_expr_add (for RobotUniExprAdd).
This is because, in Robot, it is only possible to move the
arm one unit in any direction. The Scala syntax for those
two pieces of (embedded) syntax cannot coexist side by side.
See Section III-A2 for more.

reeaud.semantics.locate is similar to
rueaud.semantics.locate. In RobotExtExprAddUniDec,
ConstDefList("a" -> 5, "b" -> 10) in Commands(
Right(Var("a")), Up(Num(2) + Var("b")), Down(
Num(4))) has semantics x = 5, y = 8.

As pointed out by Mernik, so long as functionality
is the only concern, RobotUniExprAddUniDec =
RobotExtExprAddUniDec. The difference, both in LISA and
Scala, is in the language descriptions, and the combinations by
which they are obtained. Unlike its LISA counterpart, nonetheless,
obtaining RobotExtExprAddUniDec in Scala involves
intermediate material that is not reused in the final product.

4) Extension of a Unification ((⊎)◁): RobotUniExprAddExtRobotTime
below (packaged under rueaert) extends
RobotUniExprAdd (Section II-C) by a timing facility. The time
required for carrying out a command of moving in one
direction equals what the pertaining expression evaluates to (lines 9
to 12). The method time below is a simple fold operation on
the given sequence of commands, based on that explanation.
RobotUniExprAddExtRobotTime = (Robot ⊎ ExprAdd) ◁
RobotTime.

1  object syntax {//rueaert
2    import robot_uni_expr_add.syntax._
3  }
4  object semantics {//rueaert
5    import expr_add.semantics._
6    import robot.syntax.Commands
7    import robot_uni_expr_add.syntax._
8    def time(cs: Commands): Int = cs.s.foldLeft(0) {
9      case (s, Left(e)) => s + value(e)
10     case (s, Right(e)) => s + value(e)
11     case (s, Up(e)) => s + value(e)
12     case (s, Down(e)) => s + value(e)
13   }
14 }





Commands(Right(Num(5)), Up(Num(2) + Num(10)),
Up(Num(2) + Num(2) + Num(2)), Down(Num(4))) has
the semantics x = 5, y = 14, t = 27 in rueaert.

5) Unification with an Extension ({⊎, (◁)}):
Take RobotUniExprMul = Robot ⊎ ExprMul, where
ExprAdd ◁ ExprMul. The language ExprMul extends
ExprAdd by a new ADT case for multiplication (Mul).
What is unique about ExprMul amongst the visited extension
combinations is that, upon extension, it changes the syntactic
categories of the ADT cases it borrows from ExprAdd.
(And, in fact, it also provides a new syntactic category, i.e.,
Factor.) As presented in Section III-B, this can impose
a great deal of complexity when language extension is
implemented using inheritance. Here is ExprMul (packaged
under expr_mul).


1  import expr_add.syntax.{Expr, Term, Add}
2  object syntax {//expr_mul
3    //Term ::= Factor | ...
4    class Factor extends Term
5    implicit class TermTimesFactor (val t: Term) {
6      def * (f: Factor): Term = Mul(t, f)
7    }//Term ::= ... | Term * Factor
8    //Factor ::= n
9    case class Num(n: Int) extends Factor
10   case class Mul(left: Term,
11                  right: Factor) extends Term
12 }
13 object semantics {//expr_mul
14   import syntax._
15   import expr_add.
16     ext_semantics.{value_ext => add_value}
17
18   def value_ext: (Expr, Expr => Int) => Int = {
19     case (Num(n), c) => n
20     case (Add(e, t), c) =>
21       add_value(Add(e, t), c)
22     case (Mul(t, f), c) => c(t) * c(f)
23   }
24   def value(e: Expr): Int = value_ext(e, value)
25 }


In line 1, ExprMul imports the syntactic entities it borrows
from ExprAdd: the ADT case Add and the syntactic categories
Expr and Term. It then introduces its new syntactic category
Factor in line 4. Next, in lines 5 to 7, it provides the syntactic
sugar for multiplication. Note how it, afterwards, declares
numbers to now be of the category Factor, as opposed to
Term in expr_add.syntax. The rest of expr_mul should
be straightforward except for the Scala syntax of lines 15
to 16. Those lines abbreviate
expr_add.ext_semantics.value_ext to add_value in
expr_mul.semantics. In line 21, expr_mul reuses add_value
for the sole ADT case that it borrows from expr_add, i.e., Add.

object syntax {//robot_uni_expr_mul
  import robot.syntax.Command
  import expr_add.syntax.Expr
  import expr_mul.syntax._

  case class Left (e: Expr) extends Command
  case class Right(e: Expr) extends Command
  case class Up   (e: Expr) extends Command
  case class Down (e: Expr) extends Command
}
object semantics {//robot_uni_expr_mul
  import robot.syntax.Commands
  import syntax._
  import robot.semantics.position
  import expr_mul.semantics._

  def locate (cs: Commands) = cs.s.foreach {
    case Left(e) => position.x -= value(e)
    case Right(e) => position.x += value(e)
    case Up(e) => position.y += value(e)
    case Down(e) => position.y -= value(e)
  }
}

The above implementation of RobotUniExprMul (packaged
under robot_uni_expr_mul) closely follows
RobotUniExprAdd (in Section II-C). We, therefore, do not provide
a dedicated walk-through. Commands(Right(Num(5) * Num(2)),
Down(Num(4) + Num(2) * Num(3))) has the semantics
x = 10, y = −10 in robot_uni_expr_mul.
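
How the two expressions of that example are built and evaluated can be seen in the following self-contained sketch. ExprMulSketch and the nested exprAdd object are hypothetical stand-ins for the packages, and value is written with direct recursion rather than through value_ext, purely to keep the sketch short.

```scala
// Sketch only: ExprMulSketch and exprAdd are hypothetical names.
object ExprMulSketch {
  // What is borrowed from ExprAdd: Expr, Term and the case Add.
  object exprAdd {
    class Expr { def +(t: Term): Expr = Add(this, t) }
    class Term extends Expr
    case class Add(left: Expr, right: Term) extends Expr
  }
  import exprAdd.{Expr, Term, Add}

  // ExprMul: a new category Factor; numbers now live in it.
  class Factor extends Term
  implicit class TermTimesFactor(val t: Term) {
    def *(f: Factor): Term = Mul(t, f) // sugar for multiplication
  }
  case class Num(n: Int) extends Factor
  case class Mul(left: Term, right: Factor) extends Term

  def value(e: Expr): Int = e match {
    case Num(n)    => n
    case Add(l, r) => value(l) + value(r)
    case Mul(l, r) => value(l) * value(r)
  }

  // Scala's operator precedence makes * bind tighter than +.
  val right: Expr = Num(5) * Num(2)          // Mul(Num(5), Num(2))
  val down: Expr  = Num(4) + Num(2) * Num(3) // Add(Num(4), Mul(Num(2), Num(3)))
}
```

Both expressions evaluate to 10, which accounts for x = 10 (a move to the right) and y = −10 (a move down).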

F. Language Specific? 

To investigate the extent to which Scala-specific language
features impact upon our design, we intend also to compare
against realisations in other languages. To this end, we
have prepared a C++ implementation which adopts the Scala
approach outlined so far. Respecting the dynamic polymorphism
of the Scala original, the C++ implementation utilises the
shared_ptr smart pointer to manage the memory allocation
and runtime typing of expressions, allowing the vector
container member object of the Commands class to store
different expression types. User-defined integral and string
literals also allow a notably concise syntax for the Num and Var
instantiations; e.g., Commands{Right{"a"_s}, Up{2_n +
"b"_s}, Down{4_n}}. Future work will explore this further.

Note that we are keen, in the solution of this section, not
to employ Scala’s built-in open recursion. Due to unrelated
reasons, however, Scala compilers might still employ open
recursion internally to compile our code. Nonetheless, our
code does not require that Scala idiosyncrasy. Testimony to
that lack of requirement is our C++ code. Note that whilst
open recursion is automatic in Scala, in C++, one needs to
explicitly use “this->” for the late-binding of open recursion.

III. LMS-Based 

Rompf and Odersky [2] coin Lightweight Modular Staging 
(LMS) for Polymorphic Embedding [15] of DSLs in Scala. 
They employ a fruitful combination of the Scala features 
detailed in [16] that, as a side-product, offers a very simple 
yet effective solution to EP. In this paper, we use LMS for 
that EP solution. The essence of LMS is the use of Scala 
traits for extensibility and super calls for reuse. With their 
mixin nature, Scala traits can extend one another, enjoying the 
benefits of inheritance. In particular, an ADT can be inherited 
upon trait extension. But, the heir trait can also add its own 
new ADT cases. On top of that, super calls enable reusing
methods on the cases of the original ADT, whereas the new
cases can be handled by the same method, albeit overridden
by the heir trait.

In the package eaud below (for ExprAddUniDec), for
implementing both the syntax and semantics, traits are used,
as opposed to objects in Section II. Instead of importing
members from other languages, it now extends those other
languages to acquire the same members via inheritance. In
Scala terms, eaud.syntax is, for instance, said to be mixing
in expr_add.syntax and dec.syntax, in line 1 below.

In line 4, then, eaud. semantics overrides value. In 
line 5, it handles the new ADT case eaud. syntax introduces. 
All those other ADT cases that eaud inherits are, in line 6, 
relayed to the upper levels of inheritance. 

trait syntax extends expr_add.syntax with dec.syntax {
  ... /* like eaud.syntax in Section II-E3 */ ...}
trait semantics extends syntax with expr_add.semantics {
  override def value: Expr => Int = {
    case Var(n) => consts(n)
    case e => super.value(e)} ...}

This is how LMS facilitates both simplicity and extensibility.
(Note that we did not need to resort to value_ext.)
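
The interplay of trait extension, overriding and super calls can be tried out with the following self-contained sketch. It only imitates the style described here and does not use the LMS library itself. All names (LmsStyleSketch, ExprAddSyntax, EaudSemantics and so on) are hypothetical, value is a method rather than a function value, and consts is an abstract member supplied at the point where the traits are materialised.

```scala
// Sketch only: every name in this object is hypothetical.
object LmsStyleSketch {
  trait ExprAddSyntax {
    class Expr
    class Term extends Expr
    case class Num(n: Int) extends Term
    case class Add(left: Expr, right: Term) extends Expr
  }
  trait ExprAddSemantics extends ExprAddSyntax {
    def value(e: Expr): Int = e match {
      case Num(n)    => n
      case Add(l, r) => value(l) + value(r) // late-bound recursive calls
    }
  }
  trait DecSyntax { def consts: Map[String, Int] }

  // The heir syntax inherits the ADT and adds its own case.
  trait EaudSyntax extends ExprAddSyntax with DecSyntax {
    case class Var(n: String) extends Term
  }
  // The heir semantics handles the new case and relays the rest upwards.
  trait EaudSemantics extends EaudSyntax with ExprAddSemantics {
    override def value(e: Expr): Int = e match {
      case Var(n) => consts(n)
      case other  => super.value(other)
    }
  }

  // Materialising the traits into a usable language.
  object Eaud extends EaudSemantics {
    val consts: Map[String, Int] = Map("a" -> 5, "b" -> 10)
  }

  def demo: Int = { import Eaud._; value(Add(Num(2), Var("b"))) }
}
```

The recursive calls inside ExprAddSemantics.value are dispatched to the overriding method, so a Var nested under an Add is still found; this is the built-in open recursion that the technique of Section II deliberately avoids.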

LMS has been successfully employed for languages in a 
multitude of applications. For the benefits of LMS, the reader 
is invited to consult those works. Given that we did not come 
to observe new benefits, we will not get into that here. We 
rather dedicate this section to the difficulties we faced over 
employing LMS for embedded language composition. 

A. Minor Difficulties 

The two categories of minor difficulties we faced relate
to language restriction (Section III-A1) and name clashes
upon composition (Section III-A2).

1) Language Restriction: Upon extension, the programmer
is usually provided with no means for acting selectively on the
members to be inherited. When mixing traits too, all the (public
or protected) members get inherited automatically. Hence,
with inheritance being the means for language composition,
language restriction is not possible. That enforces import as
the fallback. With the use of traits, the mechanics is, however,
more involved than in Section II. Because traits are abstract, one
needs to materialise them first (line 2 below), and only then
can they be imported from (line 3).

trait syntax/* robot_positive */{ 
val robosyn = new robot.syntax {} 
import robosyn.{Right, Up, Command, Commands} 

} 

Even though LISA also employs inheritance for language 
composition, this difficulty does not arise there. The reason 
is as follows: Being also an AG system, (subject) language 
semantics is specified in LISA by traversing the concrete 
syntax. On the other hand, leveraging its OOP, LISA allows 
the heir language to override the parent language’s concrete 
syntax. As a result, language restriction is also possible in 
LISA via inheritance. 

One final related comment: In our experience, enforced
imports like those required for language restriction were not
exclusive to that way of language composition. In fact, on
a good number of other occasions, the languages do make
selective use of one another. That, on its own, was not a knotty
problem. It, however, requires increasingly more care when
it comes to interplay with hierarchies of languages and the
relevant Scala mixins.

Note that imported names (like those in line 3 above) 
do not get inherited but the respective materialised traits 
(like robosyn in line 2 above) do. Such imports can be 
required on several occasions down the hierarchy. In the case 
of unification, however, where the multiple inheritance nature 
of mixins is employed, an extra override might also be 
enforced to disambiguate duplicated names across the meeting 
two hierarchies. See Section III-B for more. 

2) Name Clash: Recall from Section II-E3 that
RobotExtExprAddUniDec = Robot ◁ (ExprAdd ⊎ Dec). In
an LMS-based implementation of RobotExtExprAddUniDec,
therefore, one would naturally want to implement
rueaud.semantics as follows:

trait semantics extends rueaud.syntax with 

robot.semantics with eaud.semantics {//rueaud 
... /* locate like Section II-E1 */ ...} 

That is, however, not possible. The error message is: “object
Left is not a case class, nor does it have an
unapply/unapplySeq member.” The problem is that, even though
Left is inherited from robot, in locate, Scala would not be
able to match it using the syntax Left(e). The available
constructor and extractor of Left take no arguments. Moreover,
overloading that syntax is not possible. This is because Scala
desugars both case classes and case objects to objects with
unapply (or unapplySeq) methods. Objects, on the other
hand, are final, banning any later manipulation. To proceed,
one needs to use robot_uni_expr_add.semantics in
place of robot.semantics.

The problem is harder to diagnose for
RobotUniExprAddExtRobotTime. Recall from Section II-E4 that
RobotUniExprAddExtRobotTime = (Robot ⊎ ExprAdd) ◁
RobotTime. For the attempt


trait semantics extends rueaert.syntax with
  robot_uni_expr_add.semantics with
  robot_time.semantics {
  ... /* rueaert */ ...}


even when one employs robot_uni_expr_add.semantics
instead of robot.semantics, one gets an error, this time
regarding the composition itself: “overriding object Left in
trait syntax; object Left in trait syntax cannot override final
member.” The problem here is with robot_time being an
extension to robot, bringing the case object Left into the mix
with that of robot_uni_expr_add that takes an argument.

B. Major Difficulties 

The difficulties discussed in the previous subsection 
were not particularly acute in that most circumvention 
attempts succeed for them. In this section, we report 
a multi-staged combat with an acute difficulty we faced. In 
short, the combat was against the combination of Scala’s 
path-dependent typing and the intervention of concrete syntax. 

The contents of this section might look too specific to Scala. 
They are not. Scala’s path-dependent typing is just one way to 
foster family polymorphism [17] (as opposed to lightweight 
family polymorphism [18]). The familiar reader will figure 
out that the same problem is likely to emerge in every host 
language that embraces family polymorphism. 

Given that ExprMul is a direct extension to ExprAdd, one’s 
first guess would be: 

trait expr_mul.syntax extends expr_add.syntax {...} 

That is, however, not possible because, then, Num cannot be 
overridden. Recall from Section II-E5 that ExprMul changes 
the syntactic category of Num. But, even an attempt like those 
in Section III-A1 for the syntax 

trait syntax { 
  val easyn = new expr_add.syntax {} //expr_mul 
  import easyn.{Expr, Term, Add} /* Num, Factor, etc. */ 
} 

would still cause failure for the semantics: 

trait expr_mul.semantics extends syntax with 
  expr_add.semantics {...} 

Here is the error message: “overriding object Num in 
trait syntax; object Num in trait syntax cannot override final 
member.” This is because of the clash between the Num of 
such an expr_mul.syntax and that of expr_add.semantics. See 
Section III-A2 for an explanation of similar error messages. 

Now, let us suppose for the sake of argument that the 
semantics too selectively imports the ADT cases: 

trait semantics { //expr_mul 
  val emsyn = new expr_mul.syntax {} 
  import emsyn.{Num, Mul, Factor} 
  val easyn = new expr_add.syntax {} 
  import easyn.{Expr, Add, Term} 
  ... /* value or value_ext here */ ... 
} 

Recall that ExprMul adds the ADT case Mul to ExprAdd. 
To reuse - à la LMS - the ExprAdd semantics whilst also 
handling the new ADT case, one may (mistakenly) try: 

override def value: Expr => Int = { 
  case Mul(t, f) => value(t) * value(f) ... 
} 

But, that will not type-check because of path-dependent 
typing interference: the Expr in value’s signature is different 
from the Expr that Mul inherits from. Here is the error 
message for line 2 above: “constructor cannot be instantiated 
to expected type; found: semantics.this.emsyn.Mul 
required: semantics.this.Expr.” Even worse: An attempt for 
reusing the semantics of the only ADT case that remains intact 
over the move from ExprAdd to ExprMul using value_ext 

trait semantics {... //expr_mul 
  import easem.{value_ext => add_value} 
  def value_ext: (Expr, Expr => Int) => Int = { 
    case (Num(n), c) => n 
    case (Add(e, t), c) => add_value(Add(e, t), c) 
    case (Mul(t, f), c) => c(t) * c(f)} } 

will again fail due to path-dependent typing. The error message 
for line 5 above is: “type mismatch; found: 
semantics.this.easyn.Add required: semantics.this.easem.Expr.” 
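
The essence of both error messages can be reproduced with a minimal sketch, assuming a two-case ADT nested in a trait. The names PathDependentSketch, Syntax, and Semantics are hypothetical; easyn and easem are reused from above only to suggest the correspondence. Every instance of the trait carries its own family of types, so terms of one instance are not accepted where terms of another are expected.

```scala
object PathDependentSketch {
  trait Syntax {
    sealed trait Expr
    case class Num(n: Int) extends Expr
    case class Add(l: Expr, r: Expr) extends Expr
  }

  trait Semantics extends Syntax {
    def value: Expr => Int = {
      case Num(n)    => n
      case Add(l, r) => value(l) + value(r)
    }
  }

  object easyn extends Syntax      // one family of types
  object easem extends Semantics   // another, unrelated family

  // Terms built within the family of easem are accepted.
  val three: Int = easem.value(easem.Add(easem.Num(1), easem.Num(2)))

  // Does not type-check, because easyn.Add is not an easem.Expr:
  //   easem.value(easyn.Add(easyn.Num(1), easyn.Num(2)))
}
```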

Given that expr_mul.semantics is to reuse the pattern 
matching of expr_add.semantics, the former is also bound 
to the types - here, ADT cases - of the latter. In order to 
prevent the path-dependent clashes, thus, the only way forward 
seems to be for both expr_mul.syntax and 
expr_mul.semantics to import the types of expr_add.semantics. 
This is, of course, very unnatural for the former. 

trait syntax { //expr_mul 
  val easem = new expr_add.semantics {} 
  import easem.{Expr, Term, Add} 
  ...} 
trait semantics extends syntax { //expr_mul 
  import easem.{Expr, Add, value_ext => add_value} 
  def value_ext: (Expr, Expr => Int) => Int = { 
    case (Num(n), c) => n 
    case (a: Add, c) => add_value(a, c) 
  } 
} 

Still, if not done craftily enough, path-dependent typing can 
be an impediment. Replacing line 9 above with 

case (a @ Add(_, _), c) => add_value(a, c) 

will fail to type-check because a is considered to be of type 
this.Add, whereas add_value accepts an easem.Expr. 
The unsightly circumvention would be: 

case (a @ Add(_, _), c) => add_value( 
  a.asInstanceOf[easem.Expr], 
  c.asInstanceOf[easem.Expr => Int]) 

We remind the reader that all the difficulties illustrated 
in this section were only experienced in the presence 
of manipulation of the syntactic categories upon extension. 
Syntactic categories are often used for dealing with concrete 
syntax. Semantics, on the other hand, takes abstract syntax 
as input. The following section presents a solution that 
disassociates concrete syntax from abstract syntax. It applies 
LMS at the abstract syntax level and, hence, independently of 
the concrete syntax that varies across languages. That design 
sets the different languages free in engineering their syntactic 
categorisation whilst enjoying the benefits of LMS. 

IV. Refactoring 

The previous two sections were developed as if the guest 
language implementer was not aware in advance of the next 
guest languages and the upcoming combinations. We also 
maintained a backward compatibility policy in that we did 
not touch the older languages as we proceeded. Refactoring, 
however, is common in everyday software development. 

Refactoring can have a variety of meanings, depending on 
the target and the methods used [19]. Here, we do not plan 
extensive refactoring. We only focus on duplicate elimination 
in the fashion of the extract superclass method [19, §12.6]. 
Fig. 1 lists a number of duplicates in Sections II and III. 

We notice that the method value is duplicated in expr_add, 
eaud, and expr_mul. More precisely, the ADT cases Num 
and Add - which are, basically, inherited from expr_add - 
are handled thrice in the codebase. As will be shown in this 
section, we gave value its own abstraction. 

We also notice that the method locate is present in two 
sets of language descriptions: in (i) robot and 
robot_positive (when the four direction commands do not take 
arguments); and, in (ii) reeaud, robot_uni_expr_add, 
rueaud, and robot_uni_expr_mul (when the four direction 
commands do take arguments). Each of those sets constitutes 
a candidate for refactoring. Finally, EnvComm is common 
between reeaud and rueaert - constituting yet another 
refactoring candidate. Although we have indeed refactored the 
candidates of this paragraph as well, we will not include their 
demonstration in this paper. The interested reader can look 
them up in our online codebase. 

value: Expr => Int 
    expr_add{Num, Add}, eaud{Num, Add, Var}, 
    expr_mul{Num, Add, Mul} 
locate: Command => Unit (without e) 
    robot{Right, Left, Up, Down}, 
    robot_positive{Right, Up} 
locate (with (e)) 
    in reeaud: EnvComm => Unit 
    in robot_uni_expr_add: Commands => Unit 
    in rueaud: EnvComm => Unit 
    in robot_uni_expr_mul: Commands => Unit 
EnvComm 
    reeaud, rueaud 

Fig. 1: Duplicate Entities in Sections II and III 

Let us now focus on refactoring the first row of Fig. 1. 
(Refactoring the other rows of Fig. 1 is done similarly.) Here 
is a succinct summary of the actions to be taken: The idea is 
a combination of LMS and Component-Based Mechanisation 
[20], [21], [13]. We parameterise the ADT cases Num, Add, 
Var, and Mul by the language description and perform their 
semantics evaluation independently of the language 
description. We pack the two former cases - namely, Num and Add, 
which are common between all the items in the first row of 
Fig. 1 - together in a trait. Then, we extend that trait for 
Var and later for Mul, both à la LMS. Finally, the concrete 
language descriptions only get to mix the respective abstract 
descriptions. The elaboration follows. 

 1 trait na_syntax { 
 2   type E      //E for Expr 
 3   type N <: E //N for Num 
 4   type A <: E //A for Add 
 5 
 6   def n_extr(n: N): Option[Int] 
 7   def a_extr(a: A): Option[(E, E)] 
 8 
 9   object N {def unapply(n: N) = n_extr(n)} 
10   object A {def unapply(a: A) = a_extr(a)} 
11 } 
12 trait na_semantics extends na_syntax { 
13   def value: E => Int = { 
14     case N(n) => n 
15     case A(e1, e2) => value(e1) + value(e2) 
16   } 
17 } 

In the trait na_syntax above, the abstract type E (in line 2) 
is a language-independent representation for the expression 
type of a guest language. Such a guest language can be 
an item in row 1 of Fig. 1 or any similar language with 
integer arithmetic that at least contains integral literals and 
addition. Given that ADTs are implemented in Scala using 
plain inheritance, two more language-independent abstract 
types have been employed that are declared to extend 
E. Those are N for Num and A for Add, in lines 3 and 4. 

Because N and A are supposed to later be instantiated to 
the respective cases of an ADT, they are expected to come 
with the Scala matching syntax, like that in lines 14 and 
15. The Scala machinery for enforcing availability of the 
desirable matching syntax requires a coding discipline that 
is slightly tricky. The discipline involves, for each ADT case 
abstract type, inclusion of a same-named (singleton) object - 
called the companion object - that ships with an extractor, 
i.e., an unapply method of the right type signature. The 
actual duty of the extractor is relayed to an abstract method, 
to be imposed on every guest language that implements 
na_syntax. For N, for instance, that duty is on n_extr 
in line 6. The Scala signature of n_extr means that, if 
matching N succeeds, it initialises an argument of 
type Int. All that wiring enables the method 
na_semantics.value to handle the semantics of Num and Add. 
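
As a self-contained illustration of this discipline, here is a minimal runnable sketch. It deviates from the listing above in ways that are our assumptions for the sake of the sketch: the names are camel-cased (NaSyntax, nExtr, and so on), the ADT Expr is declared standalone rather than taken from Section II, and the extractors take an E rather than an N or an A, so that the sketch does not depend on how type tests against abstract types are compiled.

```scala
object AbstractLmsSketch {
  // Abstract syntax: an expression type and two extractors are all that is promised.
  trait NaSyntax {
    type E
    def nExtr(e: E): Option[Int]      // Some(n) iff e is the integral literal n
    def aExtr(e: E): Option[(E, E)]   // Some((l, r)) iff e is the addition of l and r

    // Same-named objects relay the matching syntax to the abstract methods.
    object N { def unapply(e: E): Option[Int] = nExtr(e) }
    object A { def unapply(e: E): Option[(E, E)] = aExtr(e) }
  }

  // Semantics written once, against the abstract syntax only.
  trait NaSemantics extends NaSyntax {
    def value: E => Int = {
      case N(n)      => n
      case A(e1, e2) => value(e1) + value(e2)
    }
  }

  // A hypothetical concrete guest language.
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr

  object ExprAdd extends NaSemantics {
    type E = Expr
    def nExtr(e: Expr): Option[Int] = e match {
      case Num(n) => Some(n)
      case _      => None
    }
    def aExtr(e: Expr): Option[(Expr, Expr)] = e match {
      case Add(l, r) => Some((l, r))
      case _         => None
    }
  }
}
```

With these definitions, ExprAdd.value(Add(Num(1), Add(Num(2), Num(3)))) evaluates the term through N and A alone, without value ever mentioning Num or Add.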

trait nam_syntax extends na_syntax { 
  type M <: E 
  def m_extr(m: M): Option[(E, E)] 
  object M {def unapply(m: M) = m_extr(m)} 
} 
trait nam_semantics extends nam_syntax with na_semantics { 
  override def value: E => Int = { 
    case M(e1, e2) => value(e1) * value(e2) 
    case e => super.value(e) 
  } 
} 

The trait nam_syntax adds the abstract type M (in line 2 
above), which corresponds to Mul. It also provides the Scala 
matching syntax in lines 3 and 4. The trait nam_semantics 
reuses (à la LMS) what is already implemented by 
na_semantics by performing a super call on the relevant 
ADT cases (line 9). 
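
The extension step can likewise be tried out in isolation. The sketch below repeats a compact base so that it compiles on its own, and rests on the same assumptions as before: camel-cased hypothetical names, a standalone ADT, and extractors that take an E. The point of interest is that the overriding value handles Mul and delegates every other case to super, whilst the recursive calls inside the inherited cases still dispatch to the overriding method.

```scala
object AbstractLmsExtensionSketch {
  trait NaSyntax {
    type E
    def nExtr(e: E): Option[Int]
    def aExtr(e: E): Option[(E, E)]
    object N { def unapply(e: E): Option[Int] = nExtr(e) }
    object A { def unapply(e: E): Option[(E, E)] = aExtr(e) }
  }
  trait NaSemantics extends NaSyntax {
    def value: E => Int = {
      case N(n)      => n
      case A(e1, e2) => value(e1) + value(e2)
    }
  }

  // The multiplication component: requires mExtr, provides the M(...) pattern.
  trait NamSyntax extends NaSyntax {
    def mExtr(e: E): Option[(E, E)]
    object M { def unapply(e: E): Option[(E, E)] = mExtr(e) }
  }
  trait NamSemantics extends NamSyntax with NaSemantics {
    override def value: E => Int = {
      case M(e1, e2) => value(e1) * value(e2)
      case e         => super.value(e)   // reuse the Num and Add cases
    }
  }

  // A hypothetical concrete guest language with three ADT cases.
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr

  object ExprMul extends NamSemantics {
    type E = Expr
    def nExtr(e: Expr): Option[Int] = e match { case Num(n) => Some(n); case _ => None }
    def aExtr(e: Expr): Option[(Expr, Expr)] = e match { case Add(l, r) => Some((l, r)); case _ => None }
    def mExtr(e: Expr): Option[(Expr, Expr)] = e match { case Mul(l, r) => Some((l, r)); case _ => None }
  }
}
```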

trait expr_add.syntax extends na_syntax { 
  /* ... like lines 2 to 8 of 
     expr_add.syntax in Section II ... */ 
  type E = Expr //Fix the ADT type, 
  type N = Num  //Fix the Num case, 
  type A = Add  //Fix the Add case. 
  //And, fix the extractors. 
  def n_extr(n: Num) = Num.unapply(n) 
  def a_extr(a: Add) = Add.unapply(a) 
} 
trait expr_add.semantics extends 
  expr_add.syntax with na_semantics 

In addition to working out the Section II concrete syntax, 
the trait expr_add.syntax above is now required to provide 
evidence that it indeed has ADT cases for integral literals 
and addition. That, again, involves a slightly tricky 
discipline consisting of two steps. First, in lines 4 to 6, the 
concrete counterparts for the abstract (ADT case) types in 
na_syntax are fixed. Second, in lines 8 and 9, the extractors 
promised to na_syntax are fixed. 

Recall from expr_add.syntax of Section II that Num and 
Add are both case classes. Scala actually desugars case classes 
to normal classes in addition to companion objects with the 
right-typed unapply methods. That is why we can use 
Num.unapply and Add.unapply off-the-shelf. 

Nothing more remains for expr_add.semantics to do 
except inheriting its (abstract and concrete) syntax from 
expr_add.syntax and its semantics from na_semantics. 

trait expr_mul.syntax extends nam_syntax { 
  val easyn = new expr_add.syntax {} 
  import easyn.{Expr, Term, Add};... 
  //like lines 4 to 11 of expr_mul.syntax in Section II... 
  type E = Expr; type N = Num; type A = Add; type M = Mul 
  def n_extr(n: Num) = Num.unapply(n) 
  def a_extr(a: Add) = Add.unapply(a) 
  def m_extr(m: Mul) = Mul.unapply(m)} 
trait expr_mul.semantics extends 
  expr_mul.syntax with nam_semantics 

Implementing ExprMul in this fashion is similar, as 
demonstrated above. The only difference is that, like in 
Section III, our use of traits instead of objects in favour of 
LMS imposes instantiation of the trait expr_add.syntax (line 2) 
before importing the desirable concrete syntax items (line 3). 

Remarks 

na_semantics is similar to how one defines the semantics 
of Num and Int using Modular Structural Operational 
Semantics (MSOS) [22]. In MSOS, the semantics of a component is 
defined exclusively in terms of the relevant language elements 
- making it ignorant about all other language elements. 
na_semantics only concerns Num and Int, and is ignorant 
about other language elements. γΦC₀ [13] describes that as: 
“client na_semantics<F ◁ Int ⊕ Num>{...}”, where F is 
the family parameter of na_semantics. In words, that reads: 
A family Φ to be substituted for F needs at least to have 
components Int and Num (or their equivalents) in its mix. 

From another language-theoretical viewpoint, na_syntax 
and na_semantics are both type classes [23]. From that 
viewpoint, expr_add.syntax is an instance of na_syntax 
and expr_add.semantics is an instance of na_semantics. 
The evidence for the former is provided in lines 2 to 7 
in na_syntax. Interestingly, however, our encoding of type 
classes in Scala is not the common one [24]. In particular, we 
do not prescribe the use of implicits. 
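
For contrast, the common implicit-based encoding of a comparable type class can be sketched as follows. This is not the encoding used in this paper. All names are hypothetical, and the extractors are reduced to plain methods of a trait that is parameterised by the expression type rather than holding it as an abstract type member.

```scala
object TypeClassSketch {
  // Type class: what a type E has to offer to count as an expression type.
  trait NaSyntax[E] {
    def nExtr(e: E): Option[Int]
    def aExtr(e: E): Option[(E, E)]
  }

  // Semantics, generic in E; the instance is supplied implicitly.
  def value[E](e: E)(implicit syn: NaSyntax[E]): Int =
    syn.nExtr(e).getOrElse {
      syn.aExtr(e) match {
        case Some((l, r)) => value(l) + value(r)
        case None         => throw new MatchError(e)
      }
    }

  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr

  // The instance: evidence that Expr offers integral literals and addition.
  implicit val exprInstance: NaSyntax[Expr] = new NaSyntax[Expr] {
    def nExtr(e: Expr): Option[Int] = e match { case Num(n) => Some(n); case _ => None }
    def aExtr(e: Expr): Option[(Expr, Expr)] = e match { case Add(l, r) => Some((l, r)); case _ => None }
  }

  // The type argument is spelt out because the instance is for Expr, not for Add.
  val six: Int = value[Expr](Add(Num(1), Add(Num(2), Num(3))))
}
```

One visible cost of this encoding in the present setting is that the Scala match syntax of the abstract ADT cases is no longer available inside value.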

As also announced in the last paragraph of Section III, 
na_syntax and na_semantics (and also nam_syntax and 
nam_semantics) relate to the abstract syntax only. This is 
how they leverage LMS and yet do not suffer from the concrete 
syntactic anomalies discussed in Section III. Moreover, unlike 
Modular Reifiable Matching [25], the technique we presented 
in this section is not exclusively targeting two-level types 
[26]. The reason is that our technique in this section fully 
disassociates concrete syntax from the abstract syntax so there 
no longer is an issue of levels in the types. LMS itself comes 
with no such separation either - suggesting the name abstract 
LMS for our technique. 

It is noteworthy that the disassociation of abstract and 
concrete syntax, with the lack of the LMS anomalies discussed 
in Section III, need not specifically be à la LMS. The 
same impact can also be achieved using integration of a 
decentralised pattern matching [27]. In the latter technique, 
the syntax is defined in terms of abstract syntax components. 
The concrete syntax in the latter technique is then defined on 
top of those syntax components. The difference is that the 
abstract LMS composes components (that correspond to ADT 
cases) additively [28, §17.3], whilst the latter technique would 
be composing them sequentially. 


The connection between this technique and Component- 
Based Software Engineering (CBSE) [28, §17], [29, §10] is 
also interesting. From a CBSE standpoint, nam_syntax is 
a component in that, without binding to a particular 
implementation, it specifies its so-called ‘requires’ and ‘provides’ 
interfaces. The ‘requires’ interface of nam_syntax is its lines 2 
and 3 - imposing the following two requirements, respectively: 
The user of nam_syntax needs to provide a type M. And, 
there has to be a way to extract two expressions of type E 
from an instance of M. In return, the ‘provides’ interface 
of nam_syntax is its line 4, where M’s Scala match syntax (used 
in line 8 of nam_semantics) is offered. As such, nam_syntax 
is promoting the ADT case Mul to its standalone component.¹ 
This is an important characteristic of the third technique that 
relates to the EP. The next section is dedicated to that 
relationship. 

V. Expression Problem 

EP is a recurrent problem in the field of Programming 
Languages, for which a wide range of solutions have thus 
far been proposed. Consider [34], [35], [36], [31], [32], [33], 
to name a few. Haeri [21] defines EP 
as the challenge of finding an implementation for an ADT - 
defined by its cases and the functions on it - that: 

E1. is extensible in both dimensions, i.e., both new cases and 
functions can be added. 

E2. provides weak static type safety, i.e., applying a function 
f on a statically² constructed ADT term t should fail to 
compile when f does not cover all the cases in t. 

E3. upon extension, forces no manipulation or duplication 
to the existing code. 

E4. accommodates separate compilation, i.e., compiling the 
extension imposes no requirement for repeating 
compilation or type checking of existing code. Such static 
checks should not be deferred to the link or run time. 

In Sections II-IV, we presented three techniques for embedded 
language composition in Scala. All three techniques 
satisfy E4. We now reflect on their E1-E3 competence: The 
first technique clearly satisfies E1. Section III-A2 outlines a 
scenario where LMS fails to satisfy E1. Whether the third 
technique satisfies E1 depends on whether it employs trait 
mixing for composition or not. Note that it need not. The three 
techniques all relax E2, although they can be circumvented to 
work when defaults are available [35]. That is a consequence 
of Scala performing pattern matching at runtime. LMS too 
relaxes E2 and that has thus far been considered an acceptable 
setting. (For example, MVCs [37] and Torgersen’s second 
solution [34] both have the same issue.) The state of affairs 
for LMS might change in future though [38]. 
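
The relaxation of E2 can be observed with a minimal sketch, assuming an ADT that is deliberately left unsealed so that cases can be added later. The names E2Sketch, outcome, and valueOr are hypothetical. A function that predates the new case still compiles and only fails at runtime, whereas a version that is parameterised over a default handles the unforeseen case.

```scala
object E2Sketch {
  trait Expr                                      // not sealed: cases may be added later
  case class Num(n: Int) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr

  // Covers Num and Add only.
  def value: Expr => Int = {
    case Num(n)    => n
    case Add(l, r) => value(l) + value(r)
  }

  case class Mul(l: Expr, r: Expr) extends Expr   // a case added after value was written

  // Compiles, yet the match fails at runtime with a MatchError.
  val outcome: scala.util.Try[Int] = scala.util.Try(value(Mul(Num(2), Num(3))))

  // With a default, uncovered cases are handed over instead of failing.
  def valueOr(default: Expr => Int): Expr => Int = {
    case Num(n)    => n
    case Add(l, r) => valueOr(default)(l) + valueOr(default)(r)
    case other     => default(other)
  }
}
```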

As witnessed by RobotUniExprAddUniDec in Section II-E1, 
the Scala-unspecific technique fails to satisfy E3 
when new cases are to be added. As detailed in Section III-B, 
LMS has to fight path-dependent typing to satisfy E3 when 
syntactic categories are updated upon composition. Whether 
there always is a winning strategy for LMS in such a situation 
is not known. The third technique clearly satisfies E3. 

1 Two reasons for not promoting Num and Add to components: 1) that 
would complicate presentation; 2) the current design, in which those two 
ADT cases are packed together in a single component (i.e., na_syntax), 
demonstrates how to address the Common Reuse Principle of Martin [30]. 

2 If the guarantee was for dynamically constructed terms too, we would 
have called it strong static type safety. 

We understand that the path-dependent typing difficulties 
of the LMS-based technique might indeed be a result of our 
peculiar design. In particular, our choice of giving the syntax 
and semantics of a language each a trait of their own might 
be singled out as the root cause. We would like to defend that 
choice of ours, specifically, for the likelihood of engineering 
(or experimentation with) more than one semantics for the 
same syntax [15]. In such cases, separation of the syntax and 
semantics is inevitable. 

Finally, one may wonder whether the third technique 
amounts to a new solution to EP. The answer is indeed yes, at 
least for EP in the presence of defaults [35]. This is the third 
EP solution of its kind: It promotes ADT cases to their own 
ADT-parameterised components. See [20], [21] for the first 
and [27] for the second EP solution of this kind. 

VI. Related Work 

a) LISA: As stated earlier, this paper is highly inspired 
by Mernik [6]. We essentially took his examples for showing 
how to compose languages embedded in Scala. With LISA 
being an LDF, even though Scala is famous for its hospitality 
to embedded languages, we were surprised to end up having 
fewer lines-of-code (LoC) in all the three techniques. 

Fig. 2 summarises the LoC comparison. In the LoC there, 
we have also included some syntactic cosmetics that we did 
not display in this paper. In our experience, the occasions 
where Scala outperforms LISA by far are those where the 
task is one that GPLs handle readily. Examples are RobotTime 
for all the techniques and RobotExtExprAddUniDec for the third 
technique. For the former, a simple container size query does 
the job. For the latter, simple trait mixing does. 

The first technique generally performs better (in terms of 
LoC) than LISA. The second is usually even better with its 
utilisation of trait mixing (dismissing the obvious imports) 
and super calls. Finally, the third is the best with its a 
posteriori refactoring. The two occasions when LISA 
considerably outperforms Scala are RobotUniExprAdd for the 
first technique and RobotUniExprMul for the third. Those 
correspond to Sections III-A2 and III-B, respectively. 

The factored out code in the third technique is not counted 
in Fig. 2. Once that too is added, the total LoC reaches 328 
- which is 2 more than the first technique’s LoC. We tend to 
think the reason is the simplicity in the semantics of Mernik’s 
examples. That caused the number of lines the refactoring 
saves to be less than the extra overhead the technique requires. 
For more realistic case studies, we expect the balance to 
be completely different. That would be well in favour of 
refactoring due to reasonably more involved semantics. 

b) Other Language Composition Catalogues: Völter [39] 
proposes a taxonomy of language composition that he 
showcases in JetBrains MPS. His taxonomy is along axes, not all 
of which have a clear correspondent in the work of Erdweg 
et al. As explained by Mernik, the resulting ways for language 
composition that Völter prescribes, however, are subsumed by 
the latter taxonomy. Völter’s taxonomy gives (syntax-oriented) 
IDE development for languages a higher weight. 

Barrett, Bolz, and Tratt [40] catalogue composition of six 
different Python and Prolog virtual machines. Their study has 
a particular focus on measuring performance of the resulting 
interpreters upon composition. 

Zhang et al. [41] facilitate composition of languages that 
are embedded using Object Algebras [42]. This is achieved 
using their simple predesignated annotation. Their showcase 
focuses on hierarchies of language extension. Using linearised 
multiple language inheritance, they also simulate a single 
language unification. Zhang et al. do not consider higher order 
composition. 

Melange [43] is an LDF that is specially equipped for 
language composition. Various syntactic facilities are available 
in Melange to instruct mix-and-match for many different 
aspects of a language - ranging from syntax, dynamic and 
static semantics, and name-binding to IDE features. Language 
composition under Melange is catalogued for a small set of 
showcases but with lengthy discussions on customisability. 
The current documentation of Melange, however, makes it 
hard for us to compare its catalogue of language composition 
with similar works. Specifically, we fail to figure out which 
ways for language composition Melange supports in general 
(namely, for other scenarios than the ones already in their 
documentation) and how. 

c) Components for Language Specification: PLanCompS 
funcons are syntactic constructs that ship with their own 
fixed static and dynamic semantics (presented in MSOS). 
The PLanCompS specification of a programming language 
is developed by merely assembling funcons [44]. Example 
assemblies are larger academic languages [45] and medium- 
scale ones [46]. Despite their merit, funcons do not constitute 
CBSE components. In particular, funcons do not ship with 
their ‘requires’ interfaces. 

MVCs [37] are components for solving an extension to EP. 
Rather than components in their CBSE sense, however, MVCs 
are components in a Component-Oriented Programming [47] 
sense. (Cf. [21, §4.3].) MVCs rely on the implementation 
details of how a component realises its interfaces. CBSE 
components, in contrast, are identified by their ‘requires’ and 
‘provides’ interfaces. 

Haeri and Schupp [20], [27] take a CBSE approach for 
the implementation of embedded languages. Their approach 
employs type constraints and multiple inheritance. The third 
technique here employs (possibly constrained) abstract types 
instead of type parameters. Although essentially the same, the 
former can make code terser. In Scala, however, offering the 
match syntax is apparently not possible for type parameters. 

Finally, Cazzola and Vacchi [48] too have taken a CBSE 
approach. Their components correspond to a DSL’s compiler 
passes. Accordingly, how their work relates to the common 
language specification formalisms is not clear. In contrast, 
components in our third technique are ADT cases - acting 
as the unit of study for formal semantics. 

Columns: L1 = Robot, L2 = RobotTime, L3 = RobotPositive, L4 = ExprAdd, L5 = RobotUniExprAdd, L6 = Dec, L7 = RobotUniExprAddUniDec, L8 = RobotUniExprMul 
Rows: LISA = Mernik’s Implementation, Ti = Technique i, for i ∈ {1, 2, 3} 

Fig. 2: Lines-of-Code Comparison between Mernik’s LISA and Our Three Techniques 


d) Component-Based AGs: AGs are a powerful means 
for language specification with many benefits that are well- 
studied. Attempts to modularise AGs go back to Saraiva and 
Swierstra [49]. Saraiva’s Higher Order AGs (HOAGs) [50] 
were the initial steps towards using AGs in a component- 
based fashion. Viera and Swierstra [51] formally define several 
ways to combine HOAGs. However, those ways do not tightly 
correspond to the usual composition mechanics of general- 
purpose languages. 

As far as EP is concerned, the correct behaviour of a 
HOAG w.r.t. E2 is not universally agreed upon. In terms of 
HOAGs, that amounts to the absence of an attribute expected 
from another component in the mix. In particular, should the 
code then fail statically or dynamically? Zipper functions [52], 
[53] act like Haskell by statically reporting such errors so 
long as they can be caught iteratively [54]. 

Kiama [55] uses AGs embedded in Scala for language 
specification. It is possible to use Kiama in a component-based 
fashion - as done for embedding Oberon-0 [56] in Scala [57]. 
However, disassociation of the concrete and abstract syntax 
can become non-trivial in Kiama. We anticipate that would 
cause difficulties similar to those we faced with our second 
technique. For the Oberon-0 embedding, facing such 
difficulties was unlikely because the different pieces of syntax 
were all available in advance. On the contrary, whilst composing 
unrelated pieces of syntax, a clash of concrete syntax is likely. 

VII. Conclusions and Future Work 

In this paper we present three different techniques for 
composing languages embedded in Scala. The first is Scala- 
unspecific and works in the presence of common module systems 
and higher order functions (Section II). The second is LMS- 
based and requires mixin composition and super calls 
(Section III). The third works by promoting ADT cases to ADT- 
parameterised components (Section IV). We showcase the 
three techniques using the example compositions of Mernik, 
which, in return, were designed to exhibit LISA’s composition 
facilities for Erdweg et al.’s taxonomy of composition. We 
manifest the strengths and weaknesses of each technique. We 
compare them according to their performance as EP solutions 
(Section V) and LoC (Section VI-a). 

Systematic study of embedded language composition is 
a young topic. Numerous paths exist for future research. 
Examining our third technique against larger testcases is 
an immediate future work. A promising candidate is the 
LDTA’11 challenge of modular implementation of Oberon-0. 
The testcase can then be compared with the LDTA’11 
contestants. We anticipate complications in dealing with a few 
issues along the way: Firstly, the technique takes a design-by- 
contract approach on the names it chooses for abstract types, 
e.g., A and N in na_syntax. At large scale, these names are 
likely to clash upon composition. Avoiding that would imply a 
priori knowledge. That kind of knowledge is, however, rare in 
experimental language design. Secondly, outside lab settings, 
usual software engineering techniques may become inevitable. 
We took the lab liberty of not being concerned with that here. 
For example, position and consts lack proper scoping and 
are shared intact amongst all the descendants of Robot and 
Dec, respectively. 

Type classes are more widely practised in Haskell. It 
would be interesting to see our third technique in Haskell 
with its type classes instead of Scala’s mixins and inheritance. 
The comparison between our results and those 
according to the following two Haskell EP solutions would 
be particularly interesting: Data Types à la Carte [36] and 
Parametric Compositional Datatypes [32]. 

Object Algebras are gaining traction as a powerful 
abstraction for embedded language development [31], [58], [59], 
[41]. The current technology for embedding Object Algebras, 
however, is heavyweight in both term creation [60] and algebra 
composition. It is easy to turn na_syntax and the like into 
Object Algebra Interfaces to lower those two weights. How 
useful the result would be in lowering those two weights in 
the current Object Algebras technology is another future work. 

Finally, it is important to also produce catalogues like this 
paper in other host languages than Scala. Many languages 
have merits in hosting other languages. But, the limits of 
that and the key factors of it are not clear. Composition of 
the embedded languages is certainly amongst the important 
factors. A head-to-head comparison on hospitality of language 
composition is missing. We are currently working on that. 